Linear Discriminant Analysis
Overview
Linear Discriminant Analysis (LDA) is a supervised dimensionality-reduction technique that projects data onto a direction in feature space chosen to maximize the separation between classes. It is useful for classification tasks where the goal is a low-dimensional representation of the data that preserves class separability.
Training
Objective Function
LDA and PCA both project data onto a line — but they optimise completely different objectives. Here’s the key contrast:
PCA maximises total variance: \(\mathbf{v}^\top \mathbf{C}\, \mathbf{v}\). It is unsupervised — it doesn’t know about class labels.
LDA maximises the Fisher criterion — the ratio of between-class variance to within-class variance:
\[J(\mathbf{w}) = \frac{\mathbf{w}^\top \mathbf{S}_B\, \mathbf{w}}{\mathbf{w}^\top \mathbf{S}_W\, \mathbf{w}}\]
where the scatter matrices are:
\[\mathbf{S}_B = (\boldsymbol{\mu}_A - \boldsymbol{\mu}_B)(\boldsymbol{\mu}_A - \boldsymbol{\mu}_B)^\top \qquad \text{(push means apart)}\]
\[\mathbf{S}_W = \sum_{k \in \{A,B\}} \sum_{i \in C_k} (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^\top \qquad \text{(squeeze class clouds)}\]
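The two scatter matrices can be computed in a few lines of NumPy. This is a minimal sketch on synthetic two-class data (the arrays `X_A`, `X_B` and their parameters are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic 2-D classes for illustration
X_A = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
X_B = rng.normal([2.0, 1.0], 0.5, size=(40, 2))

mu_A, mu_B = X_A.mean(axis=0), X_B.mean(axis=0)

# Between-class scatter: outer product of the mean difference ("push means apart")
d = mu_A - mu_B
S_B = np.outer(d, d)

# Within-class scatter: summed centered outer products over both classes
# ("squeeze class clouds")
S_W = sum((X - mu).T @ (X - mu)
          for X, mu in [(X_A, mu_A), (X_B, mu_B)])
```

Note that `S_B` has rank one by construction, while `S_W` is (for non-degenerate data) symmetric positive definite.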
Optimization
Write \(N(\mathbf{w}) = \mathbf{w}^\top \mathbf{S}_B \mathbf{w}\) and \(D(\mathbf{w}) = \mathbf{w}^\top \mathbf{S}_W \mathbf{w}\), so \(J = N/D\). For symmetric scatter matrices,
\[\nabla_{\mathbf{w}} N = 2\,\mathbf{S}_B \mathbf{w}, \qquad \nabla_{\mathbf{w}} D = 2\,\mathbf{S}_W \mathbf{w}.\]
The quotient rule gives
\[\nabla_{\mathbf{w}} J = \frac{D\,\nabla N - N\,\nabla D}{D^2} = \frac{2}{D^2}\Bigl(D\,\mathbf{S}_B \mathbf{w} - N\,\mathbf{S}_W \mathbf{w}\Bigr).\]
At any stationary point (where \(\nabla_{\mathbf{w}} J = \mathbf{0}\) and \(D > 0\)) we therefore have
\[\mathbf{S}_B \mathbf{w} = \frac{N}{D}\,\mathbf{S}_W \mathbf{w} = J(\mathbf{w})\,\mathbf{S}_W \mathbf{w}.\]
Assuming \(\mathbf{S}_W\) is invertible,
\[\mathbf{S}_W^{-1} \mathbf{S}_B\, \mathbf{w} = J(\mathbf{w})\, \mathbf{w}.\]
So \(\mathbf{w}\) is an eigenvector of \(\mathbf{S}_W^{-1}\mathbf{S}_B\) and the Rayleigh quotient \(J(\mathbf{w})\) is the corresponding eigenvalue. Maximising \(J\) means taking the eigenvector for \(\lambda_{\max}\).
Since \(\mathbf{S}_B = \mathbf{d}\mathbf{d}^\top\) with \(\mathbf{d} = \boldsymbol{\mu}_A - \boldsymbol{\mu}_B\), we have
\[\mathbf{S}_B \mathbf{w} = (\mathbf{d}^\top \mathbf{w})\,\mathbf{d}.\]
The eigen equation becomes \((\mathbf{d}^\top \mathbf{w})\,\mathbf{S}_W^{-1}\mathbf{d} = \lambda \mathbf{w}\), so \(\mathbf{w}\) lies on the same line as \(\mathbf{S}_W^{-1}\mathbf{d}\). LDA only cares about that line (any non-zero scalar multiple gives the same Fisher ratio); taking \(\lambda_{\max}\) and a convenient scale,
\[\mathbf{S}_W^{-1} \mathbf{S}_B\, \mathbf{w}^* = \lambda_{\max}\, \mathbf{w}^* \quad \Longrightarrow \quad \mathbf{w}^* = \mathbf{S}_W^{-1}(\boldsymbol{\mu}_A - \boldsymbol{\mu}_B)\]
(up to an arbitrary non-zero factor; you may instead enforce e.g. \(\mathbf{w}^{*\top}\mathbf{S}_W \mathbf{w}^* = 1\)).
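The closed-form direction can be sketched as follows. The function names are hypothetical, and solving \(\mathbf{S}_W \mathbf{w} = \mathbf{d}\) is used in place of forming \(\mathbf{S}_W^{-1}\) explicitly (numerically safer, same result):

```python
import numpy as np

def fisher_direction(X_A, X_B):
    """Closed-form LDA direction w* = S_W^{-1} (mu_A - mu_B)."""
    mu_A, mu_B = X_A.mean(axis=0), X_B.mean(axis=0)
    S_W = ((X_A - mu_A).T @ (X_A - mu_A)
           + (X_B - mu_B).T @ (X_B - mu_B))
    # Solve S_W w = d instead of computing the matrix inverse
    return np.linalg.solve(S_W, mu_A - mu_B)

def fisher_J(w, X_A, X_B):
    """Fisher criterion J(w) = (w^T S_B w) / (w^T S_W w)."""
    mu_A, mu_B = X_A.mean(axis=0), X_B.mean(axis=0)
    d = mu_A - mu_B
    S_W = ((X_A - mu_A).T @ (X_A - mu_A)
           + (X_B - mu_B).T @ (X_B - mu_B))
    # S_B = d d^T, so w^T S_B w = (d^T w)^2
    return (w @ d) ** 2 / (w @ S_W @ w)
```

Evaluating `fisher_J` at `fisher_direction(...)` should beat any other direction, and rescaling `w` leaves `fisher_J` unchanged, matching the scale-invariance noted above.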
The bottom curve on the interactive demo shows \(J(\mathbf{w})\) across all angles — drag the slider to ride along it and watch how the two projected distributions (right panel) overlap or separate. The green dashed line marks the LDA optimum; hit “Snap” to jump there.
Decision Threshold
Once \(\mathbf{w}^*\) is found, classification is a three-step process:
Step 1 — Project the new point
\[z = \mathbf{w}^{*\top} \mathbf{x}\]
This gives a scalar position on the LDA axis.
Step 2 — Find the decision threshold
The threshold \(z^*\) is the point on the axis that separates the two classes. A common choice is the class-size-weighted average of the projected class means, where the class sizes act as empirical priors:
\[z^* = \frac{n_A \,\tilde{\mu}_A + n_B \,\tilde{\mu}_B}{n_A + n_B}\]
where \(\tilde{\mu}_k = \mathbf{w}^{*\top} \boldsymbol{\mu}_k\) are the projected class means and \(n_k\) are class sizes. With equal priors this simplifies to:
\[z^* = \frac{\tilde{\mu}_A + \tilde{\mu}_B}{2}\]
Step 3 — Assign the class
\[\hat{y} = \begin{cases} A & \text{if } z \geq z^* \\ B & \text{if } z < z^* \end{cases}\]
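The three steps above can be sketched in one function. This is a minimal illustration (the function name is hypothetical); it also handles the orientation of \(\mathbf{w}^*\), since the rule \(z \geq z^* \Rightarrow A\) implicitly assumes class \(A\) projects above class \(B\):

```python
import numpy as np

def lda_predict(x, w, X_A, X_B):
    """Classify x: project onto w, compare to the size-weighted threshold."""
    mu_A_t = w @ X_A.mean(axis=0)   # projected class means
    mu_B_t = w @ X_B.mean(axis=0)
    n_A, n_B = len(X_A), len(X_B)
    # Step 2: size-weighted average of the projected means
    z_star = (n_A * mu_A_t + n_B * mu_B_t) / (n_A + n_B)
    # Step 1: project the new point onto the LDA axis
    z = w @ x
    # Step 3: assign the class, flipping the rule if A projects below B
    if mu_A_t < mu_B_t:
        return "A" if z < z_star else "B"
    return "A" if z >= z_star else "B"
```

With \(\mathbf{w}^* = \mathbf{S}_W^{-1}(\boldsymbol{\mu}_A - \boldsymbol{\mu}_B)\) the projected means satisfy \(\tilde{\mu}_A > \tilde{\mu}_B\) automatically, so the rule as written in the text applies directly.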